Improve performance of almost fresh builds #8837
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
Locally I get
I think this is a deadlock due to message printing and job running no longer happening on different threads, so job running fills the message channel and waits, but the message won't be processed, as the thread responsible for it is blocked.
I worked around it for now by removing the backpressure. A better solution would be to process the messages immediately or to spawn a single background thread for handling them.
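To make the failure mode concrete, here is a minimal, self-contained sketch of the same deadlock shape (not Cargo's code; the bounded channel and the string messages are stand-ins): a producer and consumer sharing one thread, where the producer blocks as soon as the channel is full.

```rust
use std::sync::mpsc;

fn main() {
    // A small, bounded channel standing in for the job message queue.
    let (tx, rx) = mpsc::sync_channel::<String>(2);

    // Producer and consumer share one thread: once the channel is full,
    // `send` blocks forever, because the draining loop below is never reached.
    for i in 0..10 {
        tx.send(format!("output line {}", i)).unwrap(); // blocks at i == 2
    }

    // Never executed.
    for msg in rx.iter() {
        println!("{}", msg);
    }
}
```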
Done
Thanks for this! Would it be possible to split apart the thread-spawn change from the error allocation change? I think the threads bit may take a moment to sort out.
}
scope.spawn(move |_| doit());
There was some back and forth about this a while ago. Originally Cargo spawned threads for everything, but then I eventually optimized it to do what you have here. In #7838 we reverted back to what we have today, but IIRC @ehuss did some measurements and found the cost to be negligible. I can't seem to find a record of that conversation though, @ehuss do you remember that?
From the PR description of #7838:
On Linux and Windows, it was generally about 0 to 5% slower.
This roughly matches my results.
This PR still doesn't buffer the rustc output, but for fresh jobs it sends the output directly to the terminal without the indirection through the message queue, thereby slightly improving performance.
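A rough sketch of that routing, with illustrative stand-in types rather than Cargo's actual `JobState`/queue API: output from fresh jobs running on the main thread is printed immediately, while worker threads keep going through the queue.

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

// Illustrative stand-in for the message queue (not Cargo's implementation).
struct MessageQueue {
    inner: VecDeque<String>,
}

impl MessageQueue {
    fn push(&mut self, line: String) {
        self.inner.push_back(line);
    }
}

/// Fresh jobs run on the main thread and print directly, so the queue can
/// never fill up and block the very thread that is supposed to drain it.
/// Dirty jobs run on worker threads and keep using the queue as before.
fn emit_stdout(on_main_thread: bool, line: String, queue: &mut MessageQueue) -> io::Result<()> {
    if on_main_thread {
        writeln!(io::stdout(), "{}", line)
    } else {
        queue.push(line);
        Ok(())
    }
}
```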
One issue I believe with this PR is that it can still deadlock. If the message queue is full and a fresh job tries to push onto it then the deadlock happens. That won't happen for stdout but other messages go through the queue as well (like build plans and such).
Another issue which I'm having trouble remembering is that we want the main "event loop" to complete quickly each iteration, and if we're copying a lot of rustc output onto the console that isn't happening. I forget the consequences of a slow loop iteration though.
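For context, a simplified, hypothetical sketch of the kind of loop being described (not Cargo's actual event loop): each iteration handles at most one message and then does housekeeping, so a long stretch spent copying rustc output inside one iteration delays everything else scheduled for that pass.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// Hypothetical message type; the real messages carry more variants.
enum Message {
    Stdout(String),
    JobFinished(u32),
}

fn drive(rx: &Receiver<Message>) {
    loop {
        match rx.recv_timeout(Duration::from_millis(100)) {
            // Copying lots of rustc output here makes this iteration slow,
            // which in turn delays the housekeeping at the bottom of the loop.
            Ok(Message::Stdout(line)) => println!("{}", line),
            Ok(Message::JobFinished(_id)) => { /* mark the job done, schedule the next one */ }
            Err(RecvTimeoutError::Timeout) => { /* nothing arrived; just do housekeeping */ }
            Err(RecvTimeoutError::Disconnected) => break,
        }
        // Housekeeping that should run frequently: redraw the progress bar,
        // check for finished jobs, spawn new ones, and so on.
    }
}
```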
The other messages don't use the bounded `push_bounded`, but the unbounded `push`.
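For readers following along, here is a minimal sketch of what a queue with both flavors of push can look like (illustrative only, not Cargo's actual `Queue` implementation):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

// Minimal queue sketch; `pop` (which would notify `space_available`) is
// omitted for brevity.
struct Queue<T> {
    items: Mutex<VecDeque<T>>,
    item_available: Condvar,
    space_available: Condvar,
    bound: usize,
}

impl<T> Queue<T> {
    /// Unbounded push: never blocks, so it is safe to call even from the
    /// thread that drains the queue (e.g. for build plans and similar).
    fn push(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.item_available.notify_one();
    }

    /// Bounded push: blocks while the queue is full, providing backpressure.
    /// Calling this from the draining thread is what can deadlock.
    fn push_bounded(&self, item: T) {
        let mut items = self.items.lock().unwrap();
        while items.len() >= self.bound {
            items = self.space_available.wait(items).unwrap();
        }
        items.push_back(item);
        self.item_available.notify_one();
    }
}
```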
> did some measurements and found the cost to be negligible. I can't seem to find a record of that conversation though, @ehuss do you remember that?
Just the note in that PR that it was expected to cost about 5%. That number will vary significantly based on the project and the system. I ran some tests with this PR, and it is pretty much the same (no difference on macOS, about 5% on Linux).
My main concern from back then with this approach is that it introduces some subtle complexity that didn't really seem to be worth the 5% improvement, but I can't think of any specific problems with this PR other than being harder to understand.
Ah sorry, I forgot about `push_bounded` vs `push`, makes sense to me!
@ehuss I'm curious, how are you measuring? I'd expect thread creation on Windows and macOS to be a good deal more expensive than on Linux.
Just using hyperfine, like this:
hyperfine -w1 "~/Proj/cargo/target/release/cargo check" "~/Proj/cargo2/target/release/cargo check"
In various projects of various sizes. Larger projects should show a bigger difference.
I just re-ran the test on a larger project (libra, 600+ packages), and the numbers look comparable to Linux, so I was probably just comparing a smaller project where the difference was less noticeable.
master:
Time (mean ± σ): 851.4 ms ± 23.5 ms [User: 574.5 ms, System: 247.4 ms]
Range (min … max): 820.7 ms … 897.5 ms 10 runs
improve_perf:
Time (mean ± σ): 815.7 ms ± 34.4 ms [User: 567.6 ms, System: 289.2 ms]
Range (min … max): 757.6 ms … 857.3 ms 10 runs
Summary
'~/Proj/rust/cargo3/target/release/cargo check' ran
1.04 ± 0.05 times faster than '~/Proj/rust/cargo2/target/release/cargo check'
Sure, #8844.
Avoid constructing an anyhow::Error when not necessary
`anyhow::Error` always captures a backtrace when created, which is expensive. Split out of #8837.
☔ The latest upstream changes (presumably #8844) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
/// however we need to immediately display the output to prevent a deadlock as the
/// output messages are processed on the same thread as they are sent from. `output`
/// defines where to output in this case.
output: Option<&'a Config>,
Can you include some notes that it is crucial that this is only set on the main thread? There are several concerns:
- The `RefMut` from shell can cause panics if used between threads.
- The event loop is carefully crafted to avoid flickering of the progress bar. By splitting the output into different places, this makes it harder to ensure batches of messages are grouped together. I think the change is OK because it is all on the main thread, but it is pretty subtle.
- Keeping one thread responsible for output helps prevent interleaving of messages. In particular, some messages are not printed atomically (like the "status" messages).
The `Config` is not `Sync`, so it isn't possible to assign it when not running on the main thread. Also, because `Shell` is wrapped in a `RefCell`, it isn't possible to borrow it twice at the same time to interleave output.
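A small standalone illustration of both points, using stand-in types (std's scoped threads are used here only to keep the sketch dependency-free; the PR itself uses crossbeam's scoped threads, as seen in the `scope.spawn(move |_| ...)` diff above):

```rust
use std::cell::RefCell;
use std::thread;

// Stand-in for Cargo's Config, which keeps its Shell behind a RefCell.
struct Config {
    shell: RefCell<String>,
}

fn main() {
    let config = Config { shell: RefCell::new(String::new()) };

    thread::scope(|s| {
        let _ = &s;
        // Does not compile if uncommented: `RefCell` is not `Sync`, so
        // `&Config` is not `Send` and cannot be captured by a worker thread.
        // s.spawn(|| config.shell.borrow_mut().push_str("from a worker"));

        // On the main thread, a second simultaneous borrow would panic
        // instead of silently interleaving output:
        let mut out = config.shell.borrow_mut();
        out.push_str("some output\n");
        // let other = config.shell.borrow_mut(); // panics: already mutably borrowed
    });
}
```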
I see. Still, please add a comment discussing the concerns about the coordination of ownership of the output. My concern about interleaving is about the future, and things to watch out for if this is ever changed (like wrapping output in a mutex). It might also help to explain how `JobState` works with respect to how it is constructed and passed into the job threads.
Interleaving would still not be possible when the `RefCell` is replaced with a `Mutex`, as the mutex would be locked for the duration of printing a single message. It would be hard to accidentally unlock the mutex in the middle of printing.
I can add some more docs to `JobState`.
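As a sketch of that point (illustrative only, not a proposed change): even with a `Mutex` in place of the `RefCell`, a whole message is written under one lock guard, so output from another thread could not slip in between the two halves of a non-atomic "status" message.

```rust
use std::io::Write;
use std::sync::Mutex;

// Illustrative: a shell protected by a Mutex instead of a RefCell.
fn print_status(shell: &Mutex<Vec<u8>>, status: &str, message: &str) {
    let mut out = shell.lock().unwrap();
    // Both halves of the "status" message are written while the lock is
    // held; it is only released when `out` goes out of scope at the end.
    write!(out, "{:>12} ", status).unwrap();
    writeln!(out, "{}", message).unwrap();
}
```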
Done
The overhead in doing so is often much higher than the actual time it takes to execute the job.
This prevents a deadlock where the message queue is filled with output messages but not emptied, as the job producing the messages runs on the same thread as the message processing.
Rebased
📌 Commit 50c0221 has been approved by
☀️ Test successful - checks-actions
Update cargo
5 commits in d5556aeb8405b1fe696adb6e297ad7a1f2989b62..8662ab427a8d6ad8047811cc4d78dbd20dd07699, 2020-11-04 22:20:36 +0000 to 2020-11-12 03:47:53 +0000:
- Check if rust-src contains a vendor dir, and patch it in (rust-lang/cargo#8834)
- Improve performance of almost fresh builds (rust-lang/cargo#8837)
- Use u32/64::to/from_le_bytes instead of bit fiddling (rust-lang/cargo#8847)
- Avoid constructing an anyhow::Error when not necessary (rust-lang/cargo#8844)
- Skip extracting .cargo-ok files from packages (rust-lang/cargo#8835)
This currently saves about 15ms out of the 170ms Cargo overhead when compiling Bevy.
part of #8833
Benchmark results as of 602a1bd:
Completely fresh:
(statistical outliers are consistently reproducible and don't happen for any other benchmarks)
Need to recompile single executable:
Benchmark results as of ba49b13:
Completely fresh:
(statistical outliers are consistently reproducible and don't happen for any other benchmarks)
Need to recompile single executable: